Clustering of Multiple Microarray Experiments Using Information Integration
Abstract
In this article, we study two microarray data integration techniques and describe how they can be applied to, and validated on, a set of independent but biologically related microarray data sets in order to derive consistent and relevant clustering results. First, we present a cluster integration approach that combines the information contained in multiple data sets at the level of expression or similarity matrices and then applies a clustering algorithm to the combined matrix for subsequent analysis. Second, we propose a technique for integrating multiple partitioning results. The performance of the proposed cluster integration algorithms is evaluated on time series expression data using two clustering algorithms and three cluster validation measures. We also propose a modified version of the Figure of Merit (FOM) algorithm that is suitable for estimating the predictive power of clustering algorithms when they are applied to multiple expression data sets. In addition, an improved version of the well-known connectivity measure is introduced to achieve a more objective evaluation of the connectivity performance of clustering algorithms.
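The first approach, integration at the level of similarity matrices followed by clustering of the combined matrix, can be illustrated with a minimal sketch. The abstract does not specify the combination rule, the distance, or the clustering algorithm, so everything below is an assumption made for illustration: the Pearson-based distance, the weighted average in `integrate`, the average-linkage procedure, and the toy data are hypothetical choices, not the authors' method.

```python
from math import sqrt

def pearson_distance(x, y):
    """Return 1 - Pearson correlation between two expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 1.0  # flat profile: treat as uncorrelated (assumption)
    return 1.0 - cov / (sx * sy)

def distance_matrix(data):
    """Gene-by-gene distance matrix for one data set (rows are genes)."""
    n = len(data)
    return [[pearson_distance(data[i], data[j]) for j in range(n)]
            for i in range(n)]

def integrate(matrices, weights=None):
    """Combine per-data-set distance matrices by a weighted average.

    Averaging is an assumed combination rule chosen for illustration.
    """
    k = len(matrices)
    weights = weights or [1.0 / k] * k
    n = len(matrices[0])
    return [[sum(w * m[i][j] for w, m in zip(weights, matrices))
             for j in range(n)] for i in range(n)]

def average_linkage(dist, n_clusters):
    """Naive agglomerative clustering on a precomputed distance matrix."""
    clusters = [[i] for i in range(len(dist))]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                total = sum(dist[i][j]
                            for i in clusters[a] for j in clusters[b])
                d = total / (len(clusters[a]) * len(clusters[b]))
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]  # merge the closest pair
        del clusters[b]
    return [sorted(c) for c in clusters]

# Toy data: the same four genes measured in two experiments with
# different numbers of time points (values are made up).
exp_a = [[1, 2, 3, 4], [2, 3, 4, 6], [4, 3, 2, 1], [5, 3, 2, 0]]
exp_b = [[0, 1, 2], [1, 2, 4], [3, 2, 1], [4, 2, 1]]

combined = integrate([distance_matrix(exp_a), distance_matrix(exp_b)])
clusters = average_linkage(combined, n_clusters=2)
print(clusters)
```

Working on distance matrices rather than raw expression values lets the two experiments have different numbers of time points, since only the gene set has to match across data sets.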
Similar resources
Integration and Reduction of Microarray Gene Expressions Using an Information Theory Approach
The DNA microarray is an important technique that allows researchers to analyze the expression of many genes in parallel. Although the data can be more significant if they come from separate experiments, one of the most challenging phases in the microarray context is the integration of separate expression level datasets that have been gathered through different techniques. In this paper, we prese...
Modification of the Fast Global K-means Using a Fuzzy Relation with Application in Microarray Data Analysis
Recognizing genes with distinctive expression levels can help in the prevention, diagnosis and treatment of diseases at the genomic level. In this paper, fast Global k-means (fast GKM) is developed for clustering gene expression datasets. Fast GKM is a significant improvement of the k-means clustering method. It is an incremental clustering method which starts with one cluster. Iteratively ...
Semi-Supervised Composite Kernel Learning Using Distance Metric Learning Techniques
The distance metric plays a key role in many machine learning and computer vision algorithms, so choosing an appropriate distance metric has a direct effect on the performance of such algorithms. Recently, distance metric learning using labeled data or other available supervisory information has become a very active research area in machine learning applications. Studies in this area have shown t...
Determination of the Minimum Sample Size in Microarray Experiments to Cluster Genes Using K-means Clustering
Gene expression profiles obtained from time-series microarray experiments can reveal important information about biological processes. However, conducting such experiments is costly and time consuming. The cost and time required are linearly proportional to sample size. Therefore, it is worthwhile to provide a way to determine the minimal number of samples or trials required in a microarray exp...
Application of Clustering Methods in DNA Microarrays
Background: DNA microarray technology has paved the way for investigators to measure the expression of thousands of genes in a short time. Analysis of this large amount of raw data includes normalization, clustering and classification. The present study surveys the application of clustering techniques in DNA microarray analysis. Materials and methods: We analyzed data from the Van’t Veer et al study dealing with BRCA1...